CLAIMS

What is claimed is: 

1. A method for determining a plurality of hypothetical matches to a spoken input,
comprising the computer-implemented steps of: 

detecting subword units in the spoken input to generate a first set of 
hypothetical matches to the spoken input; 

detecting words in the spoken input to generate a second set of
hypothetical matches to the spoken input; and

combining the first set of hypothetical matches with the second set of 
hypothetical matches to produce a combined set of hypothetical matches to the 
spoken input, the combined set having a predefined number of hypothetical 
matches.

2. The method of Claim 1, wherein the step of detecting subword units includes

detecting the subword units in the spoken input based on an acoustic
model of the subword units and a language model of the subword units;

generating pattern comparisons between (i) an input pattern
corresponding to the subword units in the spoken input and (ii) a source set of
reference patterns based on a pronunciation dictionary, each generated pattern
comparison based on the input pattern and one of the reference patterns; and

generating the first set of the hypothetical matches by sorting the source
set of reference patterns based on a closeness of each reference pattern to
correctly matching the input pattern based on an evaluation of each generated
pattern comparison, each evaluation determining a word pronunciation distance
measure that indicates how close each input pattern is to matching each
reference pattern.
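
The sorting step recited above can be illustrated with a minimal sketch. It is not the claimed implementation: the names `rank_vocabulary` and `unit_edit_distance` are hypothetical, the pronunciation dictionary is assumed to be a mapping from each word to a list of phoneme symbols, and a plain unit-cost edit distance stands in for the word pronunciation distance measure.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Phonemes = Sequence[str]


def unit_edit_distance(p: Phonemes, d: Phonemes) -> float:
    """Unit-cost Levenshtein distance; a stand-in for the distance measure."""
    prev = list(range(len(d) + 1))
    for i in range(1, len(p) + 1):
        curr = [i] + [0] * len(d)
        for j in range(1, len(d) + 1):
            sub = prev[j - 1] + (0 if p[i - 1] == d[j - 1] else 1)
            curr[j] = min(sub, prev[j] + 1, curr[j - 1] + 1)
        prev = curr
    return float(prev[len(d)])


def rank_vocabulary(
    input_pattern: Phonemes,
    dictionary: Dict[str, Phonemes],
    distance: Callable[[Phonemes, Phonemes], float] = unit_edit_distance,
) -> List[Tuple[str, float]]:
    """Compare the input pattern with every reference pattern, closest first."""
    scored = [(word, distance(input_pattern, ref)) for word, ref in dictionary.items()]
    scored.sort(key=lambda item: item[1])
    return scored


# Hypothetical three-word dictionary, for illustration only.
demo_dictionary = {
    "cat": ["k", "ae", "t"],
    "cut": ["k", "ah", "t"],
    "dog": ["d", "ao", "g"],
}
ranked = rank_vocabulary(["k", "ae", "t"], demo_dictionary)
```

Passing a different `distance` callable lets the same ranking loop use a weighted measure in place of the unit-cost one.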

3. The method of Claim 1, wherein the combined set of hypothetical matches is an
ordered list comprising a highest ranking hypothetical match in the second set of 
hypothetical matches, followed by an ordered set of hypothetical matches based 
on the first set of hypothetical matches. 
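
One possible reading of this ordering is sketched below. The function name `fuse_top_word_first` is hypothetical, and skipping a word that already appears in the list is an assumption of the sketch, not something the claim recites.

```python
from typing import List


def fuse_top_word_first(word_hyps: List[str], subword_hyps: List[str], n: int) -> List[str]:
    """Top word-decoder hypothesis first, then subword hypotheses, at most n entries."""
    combined: List[str] = []
    if word_hyps:
        combined.append(word_hyps[0])
    for hyp in subword_hyps:
        if len(combined) >= n:
            break
        if hyp not in combined:  # assumption: duplicates are dropped
            combined.append(hyp)
    return combined[:n]


fused = fuse_top_word_first(["boston", "austin"], ["austin", "boston", "houston"], 3)
```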

4. The method of Claim 1, wherein the combined set of hypothetical matches is an
ordered list based on ranking confidence levels for each hypothetical match. 
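
A confidence-ranked combination might look like the following sketch. It assumes that both decoders report confidence levels on a comparable scale and that a word proposed by both keeps its higher confidence. Neither assumption comes from the claim, and `fuse_by_confidence` is a hypothetical name.

```python
from typing import Dict, List, Tuple

Scored = Tuple[str, float]


def fuse_by_confidence(word_hyps: List[Scored], subword_hyps: List[Scored], n: int) -> List[Scored]:
    """Merge both hypothesis lists and keep the n most confident entries."""
    best: Dict[str, float] = {}
    for word, confidence in list(word_hyps) + list(subword_hyps):
        if word not in best or confidence > best[word]:
            best[word] = confidence
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


top_two = fuse_by_confidence(
    [("boston", 0.9), ("austin", 0.4)],
    [("austin", 0.7), ("houston", 0.5)],
    2,
)
```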

5. The method of Claim 1, wherein the subword units include at least one
phoneme. 

6. The method of Claim 1, wherein the hypothetical matches are words.

7. A computer system for determining a plurality of hypothetical matches to a
spoken input, comprising: 

a subword decoder for detecting subword units in the spoken input to 
generate a first set of hypothetical matches to the spoken input; 

a word decoder detecting words in the spoken input to generate a second 
set of hypothetical matches to the spoken input; and 

a list fusion module for combining the first set of hypothetical matches 
with the second set of hypothetical matches to produce a combined set of 
hypothetical matches to the spoken input, the combined set having a predefined 
number of hypothetical matches. 

8. The computer system of Claim 7, wherein the subword decoder

detects the subword units in the spoken input based on an acoustic model 
of the subword units and a language model of the subword units; 

generates pattern comparisons between (i) an input pattern corresponding 
to the subword units in the spoken input and (ii) a source set of reference 
patterns based on a pronunciation dictionary, each generated pattern comparison 
based on the input pattern and one of the reference patterns; and 

generates the first set of the hypothetical matches by sorting the source
set of reference patterns based on a closeness of each reference pattern to 
correctly matching the input pattern based on an evaluation of each generated 
pattern comparison, each evaluation determining a word pronunciation distance 
measure that indicates how close each input pattern is to matching each 
reference pattern. 

9. The computer system of Claim 7, wherein the combined set of hypothetical
matches is an ordered list comprising a highest ranking hypothetical match in 
the second set of hypothetical matches, followed by an ordered set of 
hypothetical matches based on the first set of hypothetical matches. 

10. The computer system of Claim 7, wherein the combined set of hypothetical
matches is an ordered list based on ranking confidence levels for each 
hypothetical match. 

11. The computer system of Claim 7, wherein the subword units include at least one
phoneme. 

12. The computer system of Claim 7, wherein the hypothetical matches are words.

13. A computer program product comprising:

a computer usable medium for determining a plurality of hypothetical 
matches to a spoken input; and 

a set of computer program instructions embodied on the computer
usable medium, including instructions to:

detect subword units in the spoken input to generate a first set of 
hypothetical matches to the spoken input; 

detect words in the spoken input to generate a second set of 
hypothetical matches to the spoken input; and 

combine the first set of hypothetical matches with the second set
of hypothetical matches to produce a combined set of hypothetical 
matches to the spoken input, the combined set having a predefined 
number of hypothetical matches. 

14. A method for determining a plurality of hypothetical matches to a spoken input
by detecting subword units in the spoken input, comprising the computer- 
implemented steps of: 

detecting the subword units in the spoken input based on an acoustic 
model of the subword units and a language model of the subword units; 

generating pattern comparisons between (i) an input pattern 
corresponding to the subword units in the spoken input and (ii) a source set of 
reference patterns based on a pronunciation dictionary, each generated pattern 
comparison based on the input pattern and one of the reference patterns; and 

generating a set of the hypothetical matches by sorting the source set of 
reference patterns based on a closeness of each reference pattern to correctly 
matching the input pattern based on an evaluation of each generated pattern 
comparison, each evaluation determining a word pronunciation distance
measure that indicates how close each input pattern is to matching each 
reference pattern. 

15. The method of Claim 14, wherein the pattern comparisons are based on a
confusion matrix that stores the likelihood of confusion between pairs of 
subword units, the likelihood of deleting each subword unit, and the likelihood 
of inserting each subword unit. 
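
A confusion matrix of this kind could be held in a structure like the one sketched here. The class name, its field names, and the floor value are hypothetical. Converting a likelihood into a cost by taking its negative logarithm is an assumption of the sketch, because the claim states only that the likelihoods are stored.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class ConfusionMatrix:
    """Likelihoods of confusing, deleting, and inserting subword units."""

    substitution: Dict[Tuple[str, str], float] = field(default_factory=dict)
    deletion: Dict[str, float] = field(default_factory=dict)
    insertion: Dict[str, float] = field(default_factory=dict)
    floor: float = 1e-6  # assumed floor for unseen events, avoids log(0)

    def _cost(self, likelihood: float) -> float:
        # Assumption: cost is the negative log of the stored likelihood.
        return -math.log(max(likelihood, self.floor))

    def c_subs(self, p: str, d: str) -> float:
        return self._cost(self.substitution.get((p, d), 0.0))

    def c_del(self, p: str) -> float:
        return self._cost(self.deletion.get(p, 0.0))

    def c_ins(self, d: str) -> float:
        return self._cost(self.insertion.get(d, 0.0))


cm = ConfusionMatrix(
    substitution={("ae", "ae"): 1.0, ("ae", "eh"): 0.5},
    deletion={"t": 0.25},
)
```

Under this convention a likely confusion receives a small cost and an event never observed receives the largest cost the floor allows.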

16. The method of Claim 15, further comprising a step of training the confusion
matrix based on an output of a subword decoder, the output produced from an
acoustic input of a training data set input to the subword decoder. 
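
As a rough illustration of such training, the sketch below estimates likelihoods by counting. It assumes that the decoder output has already been aligned with reference transcriptions, giving pairs of a decoded unit and a reference unit in which `None` marks a gap. The alignment step, the normalisation choices, and the name `train_confusion` are all assumptions of the sketch and are not taken from the claim.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

AlignedPair = Tuple[Optional[str], Optional[str]]  # (decoded unit, reference unit)


def train_confusion(aligned_pairs: Iterable[AlignedPair]):
    """Estimate substitution, deletion, and insertion likelihoods from counts."""
    sub_counts: Counter = Counter()
    del_counts: Counter = Counter()
    ins_counts: Counter = Counter()
    decoded_totals: Counter = Counter()
    reference_totals: Counter = Counter()
    for p, d in aligned_pairs:
        if p is not None:
            decoded_totals[p] += 1
        if d is not None:
            reference_totals[d] += 1
        if p is not None and d is not None:
            sub_counts[(p, d)] += 1  # decoded p aligned with reference d
        elif p is not None:
            del_counts[p] += 1       # decoded p has no reference partner
        elif d is not None:
            ins_counts[d] += 1       # reference d has no decoded partner
    # Assumed normalisation: relative frequency per decoded or reference unit.
    substitution: Dict[Tuple[str, str], float] = {
        (p, d): c / decoded_totals[p] for (p, d), c in sub_counts.items()
    }
    deletion = {p: c / decoded_totals[p] for p, c in del_counts.items()}
    insertion = {d: c / reference_totals[d] for d, c in ins_counts.items()}
    return substitution, deletion, insertion


pairs = [("ae", "ae"), ("ae", "eh"), ("ae", "ae"), ("ae", None), (None, "t"), ("t", "t")]
substitution, deletion, insertion = train_confusion(pairs)
```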

17. The method of Claim 15, further comprising the step of computing the
confusion matrix by determining an entry in the confusion matrix for each
unique subword unit that is in the set of reference patterns. 

18. The method of Claim 14, wherein the step of detecting subword units is
performed in a client computer, and the steps of generating pattern comparisons
and generating the set of hypothetical matches are performed in a server 
computer. 

19. The method of Claim 14, wherein the step of generating the set of hypothetical
matches includes: 

determining pairs of subword units by pairing an input subword unit
from the input pattern with a reference subword unit from the reference pattern;
and

providing the word pronunciation distance measure by calculating a
distance metric for each pair of subword units, the distance metric defined as
follows:

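\[
S(p_0, d_0) = 0
\]

\[
S(p_i, d_j) = \min \left\{
\begin{array}{l}
S(p_{i-1}, d_{j-1}) + C_{subs}(p_i, d_j) \\
S(p_{i-1}, d_j) + C_{del}(p_i) \\
S(p_i, d_{j-1}) + C_{ins}(d_j)
\end{array}
\right.
\]

\[
S(P, D) = S(p_n, d_m) + LP(p_n, d_m)
\]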



wherein: 

$S(P, D)$ is a distance between word $P$ and $D$; $P$ is a given input pattern, and
$D$, a given reference pattern;

$S(p_i, d_j)$ is a score of the given input pattern matching a given subword unit $p_i$
of $P$, and a given subword unit $d_j$ of $D$;

$C_{subs}(p_i, d_j)$ is a cost of substituting the given subword unit $p_i$ of $P$ with the
given subword unit $d_j$ of $D$;

$C_{del}(p_i)$ is a cost of deleting the given subword unit $p_i$ of $P$;

$C_{ins}(d_j)$ is a cost of inserting the given subword unit $d_j$ of $D$;

$LP(p_n, d_m)$ is a length penalty of the given input pattern $p_n$ matching the given
reference pattern $d_m$, $n$ is the length of $P$, and $m$ is the length of $D$;

$S(p_{i-1}, d_{j-1})$ has a value of zero (0) if $p_{i-1}, d_{j-1}$ is undefined;

$S(p_{i-1}, d_j)$ has the value of zero (0) if $p_{i-1}, d_j$ is undefined;

$S(p_i, d_{j-1})$ has the value of zero (0) if $p_i, d_{j-1}$ is undefined; and

the distance metric for each pair of subword units is calculated in a sequence
such that $S(p_{i-1}, d_{j-1})$, $S(p_{i-1}, d_j)$, and $S(p_i, d_{j-1})$ are determined previously to
determining $S(p_i, d_j)$.
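
The recurrence can be sketched as a conventional weighted edit-distance dynamic program. Several points are assumptions of this sketch and are not taken from the claim:

- The function name and the callable cost parameters are hypothetical.
- The length penalty is modelled as a function of the two lengths and defaults to zero.
- Index 0 is treated as an empty prefix, and a move whose predecessor would fall outside the table is skipped. This is the usual edit-distance boundary convention and may differ from a literal reading of the zero-value rule for undefined terms.

```python
from typing import Callable, List, Sequence


def pronunciation_distance(
    P: Sequence[str],
    D: Sequence[str],
    c_subs: Callable[[str, str], float],
    c_del: Callable[[str], float],
    c_ins: Callable[[str], float],
    length_penalty: Callable[[int, int], float] = lambda n, m: 0.0,
) -> float:
    """Return S(P, D) for input pattern P and reference pattern D."""
    n, m = len(P), len(D)
    inf = float("inf")
    S: List[List[float]] = [[inf] * (m + 1) for _ in range(n + 1)]
    S[0][0] = 0.0
    # Row-major order: the three predecessors of S[i][j] are always filled first.
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = inf
            if i > 0 and j > 0:
                best = min(best, S[i - 1][j - 1] + c_subs(P[i - 1], D[j - 1]))
            if i > 0:
                best = min(best, S[i - 1][j] + c_del(P[i - 1]))
            if j > 0:
                best = min(best, S[i][j - 1] + c_ins(D[j - 1]))
            S[i][j] = best
    return S[n][m] + length_penalty(n, m)


# Unit costs, for illustration only; real costs would come from a confusion matrix.
def unit_subs(p: str, d: str) -> float:
    return 0.0 if p == d else 1.0


def unit_cost(unit: str) -> float:
    return 1.0


example = pronunciation_distance(["k", "ae", "t"], ["k", "ah", "t"], unit_subs, unit_cost, unit_cost)
```

Filling the table row by row satisfies the ordering requirement at the end of the claim, because each cell is computed only after its three predecessors.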

20. The method of Claim 14, wherein the subword units include at least one
phoneme. 

21. A computer system for determining a plurality of hypothetical matches to a
spoken input by detecting subword units in the spoken input, comprising: 

a subword decoder for detecting the subword units in the spoken input 
based on an acoustic model of the subword units and a language model of the 
subword units; and 

a subword detection vocabulary look up module for generating pattern 
comparisons between (i) an input pattern corresponding to the subword units in 
the spoken input and (ii) a source set of reference patterns based on a 
pronunciation dictionary, each generated pattern comparison based on the input 
pattern and one of the reference patterns; 

the subword detection vocabulary look up module generating a set of the 
hypothetical matches by sorting the source set of reference patterns based on a 
closeness of each reference pattern to correctly matching the input pattern based 
on an evaluation of each generated pattern comparison, each evaluation 
determining a word pronunciation distance measure that indicates how close 
each input pattern is to matching each reference pattern. 

22. The computer system of Claim 21, wherein the pattern comparisons are based
on a confusion matrix that stores the likelihood of confusion between pairs of 
subword units, the likelihood of deleting each subword unit, and the likelihood 
of inserting each subword unit. 

23. The computer system of Claim 22, wherein the confusion matrix is trained on an
output of the subword decoder, the output produced from an acoustic input of a 
training data set to the subword decoder. 

24. The computer system of Claim 22, wherein the confusion matrix is based on
determining an entry in the confusion matrix for each unique subword unit that 
is in the set of reference patterns. 

25. The computer system of Claim 21, wherein the subword decoder is part of a
client computer, and the subword detection vocabulary look up module is part of 
a server computer. 

26. The computer system of Claim 21, wherein the subword detection vocabulary
look up module 

determines pairs of subword units by pairing an input subword unit from 
the input pattern with a reference subword unit from the reference pattern; and 

provides the word pronunciation distance measure by calculating a 
distance metric for each pair of subword units, the distance metric defined as 
follows: 



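\[
S(p_0, d_0) = 0
\]

\[
S(p_i, d_j) = \min \left\{
\begin{array}{l}
S(p_{i-1}, d_{j-1}) + C_{subs}(p_i, d_j) \\
S(p_{i-1}, d_j) + C_{del}(p_i) \\
S(p_i, d_{j-1}) + C_{ins}(d_j)
\end{array}
\right.
\]

\[
S(P, D) = S(p_n, d_m) + LP(p_n, d_m)
\]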

wherein:

$S(P, D)$ is a distance between word $P$ and $D$; $P$ is a given input pattern, and
$D$, a given reference pattern;

$S(p_i, d_j)$ is a score of the given input pattern matching a given subword unit $p_i$
of $P$, and a given subword unit $d_j$ of $D$;

$C_{subs}(p_i, d_j)$ is a cost of substituting the given subword unit $p_i$ of $P$ with the
given subword unit $d_j$ of $D$;

$C_{del}(p_i)$ is a cost of deleting the given subword unit $p_i$ of $P$;

$C_{ins}(d_j)$ is a cost of inserting the given subword unit $d_j$ of $D$;

$LP(p_n, d_m)$ is a length penalty of the given input pattern $p_n$ matching the given
reference pattern $d_m$, $n$ is the length of $P$, and $m$ is the length of $D$;

$S(p_{i-1}, d_{j-1})$ has a value of zero (0) if $p_{i-1}, d_{j-1}$ is undefined;

$S(p_{i-1}, d_j)$ has the value of zero (0) if $p_{i-1}, d_j$ is undefined;

$S(p_i, d_{j-1})$ has the value of zero (0) if $(p_i, d_{j-1})$ is undefined; and

the distance metric for each pair of subword units is calculated in a sequence
such that $S(p_{i-1}, d_{j-1})$, $S(p_{i-1}, d_j)$, and $S(p_i, d_{j-1})$ are determined previously to
determining $S(p_i, d_j)$.

27. The computer system of Claim 21, wherein the subword units include at least
one phoneme. 

28. A computer program product comprising:

a computer usable medium for determining a plurality of hypothetical 
matches to a spoken input by detecting subwords in the spoken input; and 

a set of computer program instructions embodied on the computer
usable medium, including instructions to:

detect the subword units in the spoken input based on an acoustic
model of the subword units and a language model of the subword units;

generate pattern comparisons between (i) an input pattern
corresponding to the subword units in the spoken input and (ii) a source
set of reference patterns based on a pronunciation dictionary, each
generated pattern comparison based on the input pattern and one of the
reference patterns; and

generate a set of the hypothetical matches by sorting the source
set of reference patterns based on a closeness of each reference pattern to
correctly matching the input pattern based on an evaluation of each
generated pattern comparison, each evaluation determining a word
pronunciation distance measure that indicates how close each input
pattern is to matching each reference pattern.